56 research outputs found

    Computers and drug discovery : construction and data mining of chemical and biological databases

    In general, biological and chemical causes for harmful effects were studied through bioinformatics and cheminformatics efforts. A database of human genetic variants in G protein-coupled receptors was constructed, and differences between neutral and harmful variants were studied. A database of compounds with their mutagenicity data was constructed, and substructures were extracted that distinguish between Ames positive and Ames negative compounds. Keywords: cheminformatics, chemoinformatics, bioinformatics, databases, data mining, drug discovery, SNPs, polymorphisms, substructures.

    Flexible graph matching and graph edit distance using answer set programming

    The graph isomorphism, subgraph isomorphism, and graph edit distance problems are combinatorial problems with many applications. Heuristic exact and approximate algorithms for each of these problems have been developed for different kinds of graphs: directed, undirected, labeled, etc. However, additional work is often needed to adapt such algorithms to different classes of graphs, for example to accommodate both labels and property annotations on nodes and edges. In this paper, we propose an approach based on answer set programming. We show how each of these problems can be defined for a general class of property graphs with directed edges, and labels and key-value properties annotating both nodes and edges. We evaluate this approach on a variety of synthetic and realistic graphs, demonstrating that it is feasible as a rapid prototyping approach. Comment: To appear, PADL 202
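
To make the problem statement concrete, here is a minimal brute-force sketch of isomorphism checking on small property graphs. It is not the paper's answer set programming encoding; it swaps in plain Python enumeration over node permutations. The dictionary layout (`nodes`, `edges`) and the function name `isomorphic` are assumptions made for illustration, and the sketch assumes at most one edge per ordered node pair.

```python
from itertools import permutations


def isomorphic(g1, g2):
    """Brute-force isomorphism test for small directed property graphs.

    Assumed (hypothetical) layout:
      g["nodes"]: {node_id: (label, properties_dict)}
      g["edges"]: {(src_id, dst_id): (label, properties_dict)}
    """
    n1, n2 = g1["nodes"], g2["nodes"]
    e1, e2 = g1["edges"], g2["edges"]
    if len(n1) != len(n2) or len(e1) != len(e2):
        return False
    ids1 = list(n1)
    # Try every bijection between the two node sets.
    for perm in permutations(n2):
        m = dict(zip(ids1, perm))
        nodes_match = all(n1[v] == n2[m[v]] for v in ids1)
        # Edge counts are equal and m is injective, so mapping every edge of
        # g1 onto an identically annotated edge of g2 covers all edges of g2.
        edges_match = all(e2.get((m[s], m[d])) == e1[(s, d)] for (s, d) in e1)
        if nodes_match and edges_match:
            return True
    return False


g1 = {
    "nodes": {"a": ("Person", {"name": "x"}), "b": ("City", {})},
    "edges": {("a", "b"): ("lives_in", {"since": 2020})},
}
g2 = {
    "nodes": {1: ("City", {}), 2: ("Person", {"name": "x"})},
    "edges": {(2, 1): ("lives_in", {"since": 2020})},
}
print(isomorphic(g1, g2))
```

The enumeration is factorial in the number of nodes, so it is only useful as a reference definition for tiny graphs.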

    Interpreting linear support vector machine models with heat map molecule coloring

    Background: Model-based virtual screening plays an important role in the early drug discovery stage. The outcomes of high-throughput screenings are a valuable source for machine learning algorithms to infer such models. Besides strong performance, the interpretability of a machine learning model is a desirable property for guiding the optimization of a compound in later drug discovery stages. Linear support vector machines have been shown to perform convincingly on large-scale data sets. The goal of this study is to present a heat map molecule coloring technique to interpret linear support vector machine models. Based on the weights of a linear model, the visualization approach colors each atom and bond of a compound according to its importance for activity. Results: We evaluated our approach on a toxicity data set, a chromosome aberration data set, and the maximum unbiased validation data sets. The experiments show that our method sensibly visualizes structure-property and structure-activity relationships of a linear support vector machine model. The coloring of ligands in the binding pocket of several crystal structures of a maximum unbiased validation data set target indicates that our approach helps to determine the correct ligand orientation in the binding pocket. Additionally, the heat map coloring enables the identification of substructures important for the binding of an inhibitor. Conclusions: In combination with heat map coloring, linear support vector machine models can help to guide the modification of a compound in later stages of drug discovery. In particular, substructures identified as important by our method might be a starting point for the optimization of a lead compound. The heat map coloring should be considered complementary to structure-based modeling approaches. As such, it helps to get a better understanding of the binding mode of an inhibitor.
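
The idea of coloring atoms by linear-model weights can be sketched as follows. The abstract does not say how weights are mapped to atoms, so this sketch assumes that each model feature is a substructure covering a known set of atom indices and that an atom's score is the sum of the weights of the features containing it. The names `atom_scores`, `to_rgb`, `weights`, and `feature_atoms` are hypothetical, as is the red/blue color scale.

```python
def atom_scores(weights, feature_atoms, n_atoms):
    """Sum, per atom, the weights of all features that cover that atom.

    weights:       {feature_name: linear model weight}   (assumed layout)
    feature_atoms: {feature_name: [atom indices]}        (assumed layout)
    """
    scores = [0.0] * n_atoms
    for feature, atoms in feature_atoms.items():
        w = weights.get(feature, 0.0)
        for a in atoms:
            scores[a] += w
    return scores


def to_rgb(score, max_abs):
    """Map a score to a color: red for positive, blue for negative, white for zero."""
    if max_abs == 0:
        return (255, 255, 255)
    t = max(-1.0, min(1.0, score / max_abs))
    fade = round(255 * (1 - abs(t)))
    return (255, fade, fade) if t > 0 else (fade, fade, 255)


# Toy example: two substructure features over a three-atom molecule.
weights = {"f1": 2.0, "f2": -1.0}
feature_atoms = {"f1": [0, 1], "f2": [1, 2]}
scores = atom_scores(weights, feature_atoms, 3)
max_abs = max(abs(s) for s in scores)
colors = [to_rgb(s, max_abs) for s in scores]
print(scores, colors)
```

Normalizing by the largest absolute score keeps the color scale comparable across atoms within one molecule; other normalizations are equally plausible.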

    Discovering collectively informative descriptors from high-throughput experiments

    Background: Improvements in high-throughput technology and its increasing use have led to the generation of many highly complex datasets that often address similar biological questions. Combining information from these studies can increase the reliability and generalizability of results and also yield new insights that guide future research. Results: This paper describes a novel algorithm called BLANKET for symmetric analysis of two experiments that assess informativeness of descriptors. The experiments are required to be related only in that their descriptor sets intersect substantially and their definitions of case and control are consistent. From resulting lists of n descriptors ranked by informativeness, BLANKET determines shortlists of descriptors from each experiment, generally of different lengths p and q. For any pair of shortlists, four numbers are evident: the number of descriptors appearing in both shortlists, in only the first shortlist, in only the second shortlist, or in neither shortlist. From the associated contingency table, BLANKET computes Right Fisher Exact Test (RFET) values used as scores over a plane of possible pairs of shortlist lengths [1, 2]. BLANKET then chooses a pair or pairs with RFET score less than a threshold; the threshold depends upon n and shortlist length limits and represents a quality of intersection achieved by less than 5% of random lists. Conclusions: Researchers seek within a universe of descriptors some minimal subset that collectively and efficiently predicts experimental outcomes. Ideally, any smaller subset should be insufficient for reliable prediction and any larger subset should have little additional accuracy. As a method, BLANKET is easy to conceptualize and presents only moderate computational complexity. Many existing databases could be mined using BLANKET to suggest optimal sets of predictive descriptors.
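
The scoring step described above can be sketched with a right-tailed Fisher exact test on the shortlist overlap. This is a minimal illustration under stated assumptions, not BLANKET's implementation: it assumes both ranked lists cover the same n descriptors, it computes the hypergeometric tail directly, and it omits the threshold selection. The function names `right_fisher_exact` and `overlap` are hypothetical.

```python
from math import comb


def right_fisher_exact(n, p, q, k):
    """Probability of an overlap of at least k between two random shortlists.

    n: total number of descriptors
    p, q: shortlist lengths for the two experiments
    k: observed number of descriptors present in both shortlists
    """
    total = comb(n, q)
    upper = min(p, q)
    tail = sum(comb(p, i) * comb(n - p, q - i) for i in range(k, upper + 1))
    return tail / total


def overlap(ranked_a, ranked_b, p, q):
    """Count descriptors shared by the top p of one list and the top q of the other."""
    return len(set(ranked_a[:p]) & set(ranked_b[:q]))


# Toy example with five descriptors ranked by two experiments.
ranked_a = ["a", "b", "c", "d", "e"]
ranked_b = ["b", "a", "d", "c", "e"]
k = overlap(ranked_a, ranked_b, 2, 3)
score = right_fisher_exact(len(ranked_a), 2, 3, k)
print(k, score)
```

Evaluating `right_fisher_exact` for every admissible pair (p, q) would give the plane of scores that the abstract refers to; a smaller value indicates an overlap that random lists rarely achieve.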

    Comparative study of classification algorithms using molecular descriptors in toxicological databases

    The rational development of new drugs is a complex and expensive process, comprising several steps. Typically, it starts by screening databases of small organic molecules for chemical structures with the potential to bind to a target receptor and prioritizing the most promising ones. Only a few of these will be selected for biological evaluation and further refinement through chemical synthesis. Despite the knowledge accumulated by pharmaceutical companies, which continually improve the process of finding new drugs, a myriad of factors affect the activity of putative candidate molecules in vivo, and the propensity for causing adverse and toxic effects is recognized as the major hurdle behind the current "target-rich, lead-poor" scenario. In this study we evaluate the use of several Machine Learning algorithms to find useful rules for the elucidation and prediction of toxicity using 1D and 2D molecular descriptors. The results indicate that: i) Machine Learning algorithms can effectively use 1D molecular descriptors to construct accurate and simple models; ii) extending the set of descriptors to include 2D descriptors improves the accuracy of the models.

    Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies

    The dramatic increase in heterogeneous types of biological data—in particular, the abundance of new protein sequences—requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity—GPCRs and kinases from humans, and the crotonase superfamily of enzymes—we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships

    Site-Directed Mutations and the Polymorphic Variant Ala160Thr in the Human Thromboxane Receptor Uncover a Structural Role for Transmembrane Helix 4

    The human thromboxane A2 receptor (TP) belongs to the prostanoid subfamily of Class A GPCRs and mediates vasoconstriction and promotes thrombosis on binding to thromboxane (TXA2). In Class A GPCRs, transmembrane (TM) helix 4 appears to be a hot spot for non-synonymous single nucleotide polymorphic (nsSNP) variants. Interestingly, A160T is a novel nsSNP variant with unknown structure and function. Additionally, within this helix in TP, Ala160^4.53 is highly conserved, as is Gly164^4.57. Here we target Ala160^4.53 and Gly164^4.57 in the TP for detailed structure-function analysis. Amino acid replacements with smaller residues, the A160S and G164A mutants, were tolerated, while the bulkier beta-branched replacements, A160T and A160V, showed a significant decrease in receptor expression (Bmax). The nsSNP variant A160T displayed significant agonist-independent activity (constitutive activity). Guided by molecular modeling, a series of compensatory mutations were made on TM3 in order to accommodate the bulkier replacements on TM4. The A160V/F115A double mutant showed a moderate increase in expression level compared to either the A160V or the F115A single mutant. Thermal activity assays showed a decrease in receptor stability in the order wild type > A160S > A160V > A160T > G164A, with G164A being the least stable. Our study reveals that Ala160^4.53 and Gly164^4.57 in the TP play critical structural roles in the packing of the TM3 and TM4 helices. Naturally occurring mutations in conjunction with site-directed replacements can serve as powerful tools in assessing the importance of regional helix-helix interactions.

    Open Babel: An open chemical toolbox

    Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license fro